大型的语言模型(PRELMS)正在彻底改变所有基准的自然语言处理。但是,它们的巨大尺寸对于小型实验室或移动设备上的部署而言是过分的。修剪和蒸馏等方法可减少模型尺寸,但通常保留相同的模型体系结构。相反,我们探索了蒸馏预告片中的更有效的架构,单词的持续乘法(CMOW),该构造将每个单词嵌入为矩阵,并使用矩阵乘法来编码序列。我们扩展了CMOW体系结构及其CMOW/CBOW-HYBRID变体,具有双向组件,以提供更具表现力的功能,在预绘制期间进行一般(任务无义的)蒸馏的单次表示,并提供了两种序列编码方案,可促进下游任务。句子对,例如句子相似性和自然语言推断。我们的基于矩阵的双向CMOW/CBOW-HYBRID模型在问题相似性和识别文本范围内的Distilbert具有竞争力,但仅使用参数数量的一半,并且在推理速度方面快三倍。除了情感分析任务SST-2和语言可接受性任务COLA外,我们匹配或超过ELMO的ELMO分数。但是,与以前的跨架结构蒸馏方法相比,我们证明了检测语言可接受性的分数增加了一倍。这表明基于基质的嵌入可用于将大型预赛提炼成竞争模型,并激励朝这个方向进行进一步的研究。
translated by 谷歌翻译
正在投入大量努力,将最新的分类和认可到具有极端资源限制(内存,速度和缺乏GPU支持)的边缘设备。在这里,我们演示了第一个用于声学识别的深层网络,该网络小,灵活且适合压缩,但实现了原始音频分类的最新性能。我们没有手工制作一次性解决方案,而是提出了一条通用管道,该管道通过压缩和量化自动将大型深卷积网络转换为资源破裂的边缘设备的网络。在引入ACDNET(在ESC-10(96.65%),ESC-50(87.10%),Urbansound8K(84.45%)和AudioEvent(92.57%)上产生的ACDNET之后,我们描述了压缩管道和压缩管道和AudioEvent(92.57%)证明它使我们能够降低97.22%的尺寸和减少97.28%的失败,同时保持接近最先进的准确性96.25%,83.65%,78.27%和89.69%的掉落。我们描述了对标准现成的微控制器的成功实现,除了实验室基准测试之外,还报告了对现实世界数据集的成功测试。
translated by 谷歌翻译
Deep-learning of artificial neural networks (ANNs) is creating highly functional tools that are, unfortunately, as hard to interpret as their natural counterparts. While it is possible to identify functional modules in natural brains using technologies such as fMRI, we do not have at our disposal similarly robust methods for artificial neural networks. Ideally, understanding which parts of an artificial neural network perform what function might help us to address a number of vexing problems in ANN research, such as catastrophic forgetting and overfitting. Furthermore, revealing a network's modularity could improve our trust in them by making these black boxes more transparent. Here we introduce a new information-theoretic concept that proves useful in understanding and analyzing a network's functional modularity: the relay information $I_R$. The relay information measures how much information groups of neurons that participate in a particular function (modules) relay from inputs to outputs. Combined with a greedy search algorithm, relay information can be used to {\em identify} computational modules in neural networks. We also show that the functionality of modules correlates with the amount of relay information they carry.
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
Local patterns play an important role in statistical physics as well as in image processing. Two-dimensional ordinal patterns were studied by Ribeiro et al. who determined permutation entropy and complexity in order to classify paintings and images of liquid crystals. Here we find that the 2 by 2 patterns of neighboring pixels come in three types. The statistics of these types, expressed by two parameters, contains the relevant information to describe and distinguish textures. The parameters are most stable and informative for isotropic structures.
translated by 谷歌翻译
Passive monitoring of acoustic or radio sources has important applications in modern convenience, public safety, and surveillance. A key task in passive monitoring is multiobject tracking (MOT). This paper presents a Bayesian method for multisensor MOT for challenging tracking problems where the object states are high-dimensional, and the measurements follow a nonlinear model. Our method is developed in the framework of factor graphs and the sum-product algorithm (SPA). The multimodal probability density functions (pdfs) provided by the SPA are effectively represented by a Gaussian mixture model (GMM). To perform the operations of the SPA in high-dimensional spaces, we make use of Particle flow (PFL). Here, particles are migrated towards regions of high likelihood based on the solution of a partial differential equation. This makes it possible to obtain good object detection and tracking performance even in challenging multisensor MOT scenarios with single sensor measurements that have a lower dimension than the object positions. We perform a numerical evaluation in a passive acoustic monitoring scenario where multiple sources are tracked in 3-D from 1-D time-difference-of-arrival (TDOA) measurements provided by pairs of hydrophones. Our numerical results demonstrate favorable detection and estimation accuracy compared to state-of-the-art reference techniques.
translated by 谷歌翻译
Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the position of certain anchor nodes can be controlled. We introduce an active planning method that aims at moving the anchors such that the information gain of future measurements is maximized. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrixes (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of marginal posteriors of agent positions to the control layer for approximate Bayesian FIM computations. Based on a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future timesteps, a receding horizon (RH) control is performed. Approximations that make it possible to solve the resulting tree-search problem efficiently are also discussed. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the resulting significantly improved localization accuracy.
translated by 谷歌翻译
It is well known that conservative mechanical systems exhibit local oscillatory behaviours due to their elastic and gravitational potentials, which completely characterise these periodic motions together with the inertial properties of the system. The classification of these periodic behaviours and their geometric characterisation are in an on-going secular debate, which recently led to the so-called eigenmanifold theory. The eigenmanifold characterises nonlinear oscillations as a generalisation of linear eigenspaces. With the motivation of performing periodic tasks efficiently, we use tools coming from this theory to construct an optimization problem aimed at inducing desired closed-loop oscillations through a state feedback law. We solve the constructed optimization problem via gradient-descent methods involving neural networks. Extensive simulations show the validity of the approach.
translated by 谷歌翻译
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures the proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. By cooperatively and simultaneously training, the models the KL distance becomes incapable of properly minimizing the teacher's and student's distributions. Alongside accuracy, critical edge device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
translated by 谷歌翻译
Artificial intelligence(AI) systems based on deep neural networks (DNNs) and machine learning (ML) algorithms are increasingly used to solve critical problems in bioinformatics, biomedical informatics, and precision medicine. However, complex DNN or ML models that are unavoidably opaque and perceived as black-box methods, may not be able to explain why and how they make certain decisions. Such black-box models are difficult to comprehend not only for targeted users and decision-makers but also for AI developers. Besides, in sensitive areas like healthcare, explainability and accountability are not only desirable properties of AI but also legal requirements -- especially when AI may have significant impacts on human lives. Explainable artificial intelligence (XAI) is an emerging field that aims to mitigate the opaqueness of black-box models and make it possible to interpret how AI systems make their decisions with transparency. An interpretable ML model can explain how it makes predictions and which factors affect the model's outcomes. The majority of state-of-the-art interpretable ML methods have been developed in a domain-agnostic way and originate from computer vision, automated reasoning, or even statistics. Many of these methods cannot be directly applied to bioinformatics problems, without prior customization, extension, and domain adoption. In this paper, we discuss the importance of explainability with a focus on bioinformatics. We analyse and comprehensively overview of model-specific and model-agnostic interpretable ML methods and tools. Via several case studies covering bioimaging, cancer genomics, and biomedical text mining, we show how bioinformatics research could benefit from XAI methods and how they could help improve decision fairness.
translated by 谷歌翻译